Chapter 3: Theories, hypotheses, and comparisons

Reminders

  • Workbook homework due on Thursday

  • Pilot survey is in the works; expect to see a link next week. Then you’ll need to take the survey yourself and provide feedback on the questions.

    • I’m trying to consolidate, so you might not see your exact question, but you will hopefully see something that approximates it.

Goals

  • Begin looking at relationships between variables

  • Theoretical Arguments and Hypotheses

  • Elements of a good theory

  • Writing good hypotheses

  • Testing Hypotheses

(material here comes mostly from Chapter 3 of your textbook)

The goals of research

  • Describing: making generalizations about the world

  • Predicting: generating expectations about what will happen in the future

  • Explaining: explaining why things are related.

Explanation is often the toughest to achieve, but also the most desirable because it allows us to do things like make changes to reach a desired outcome.

“Why” questions

  • Explanatory research often starts with “why” questions:

    • Why do some eligible voters fail to turn out on election day?
    • What explains variation in gun policies across U.S. states?
    • Why did communist revolutions happen in China and Russia but not in Western Europe or the United States?

Theories

“a logically interconnected set of propositions from which empirical uniformities can be derived” – Robert K. Merton

Theories are explanations, assumptions, claims and narratives that provide a set of expectations that link a cause to an effect.

Purely descriptive or predictive analyses don’t necessarily require a theory, but theory is a key component of explanatory research.

Theoretical Scope

Theories vary in their scope:

  • Early sociologists like Marx, Durkheim, and Weber attempted to develop all-encompassing “laws” of political, social, and historical change. These are sometimes called “grand theories.”
  • Contemporary social sciences are less ambitious, so it’s more common to propose “middle-range” theories that seek to explain a smaller number of regularities in one area.
    • However, they may draw on “grand theories” either implicitly or explicitly.

Paths to Modernity

What explains differences in the “path to modernity” across different countries during the 20th century?

  • Free markets/Democracy in the U.S. and England

  • Fascism in Germany and Japan

  • Communism in Russia and China

Paths to Modernity

Barrington Moore: Classes have unique and conflicting interests. Conflicts over these interests come to the forefront during industrialization. The outcomes of these class conflicts shape the political and economic system.

  • Fascist states emerge when the landed aristocracy wins

  • Communist states emerge when the peasant class wins

  • Democracies emerge when the bourgeoisie (middle class) wins.

Paths to Modernity

  • Moore’s theory borrows assumptions from a (sociological) Marxist grand theory about class conflict

  • He “tests” it by showing how it fits the selected cases.

  • It can be used and refined to generate a set of empirical expectations about what factors should matter for democratization. For instance, we might expect:

    • States with larger agricultural sectors during industrialization to be less democratic today (compared to states with smaller agricultural sectors)

    • States with higher literacy rates during industrialization to be more democratic (compared to states with lower literacy rates)

Good Theories

  • (For our purposes) theories are statements about cause and effect and causal mechanisms: if X happens, then Y will follow as a result because…
  • Good theories clearly identify:

    • A dependent variable (DV): the outcome to be explained

    • One or more independent variables (IVs): the causal factors that affect the DV.

    • A causal mechanism that links these two things.

    • An expectation about the direction of the effect (positive, negative, something more complex)

  • Most social science theories are probabilistic instead of deterministic. So we’ll speak in terms of more/less likely or higher/lower.

  • Good theories should generate expectations that can be empirically tested (even if the theory itself can’t be tested)

Theories: survey answers

What explains inconsistent answers to survey questions?

  • Asking the same people the same questions a few months apart yields surprisingly unpredictable results.

  • Small changes in question wording, ordering of choices, or survey context cause big changes in outcomes

Data from the 1980 ANES panel survey (reproduced from Zaller 1992)

Theories: RAS model

  • Receive-Accept-Sample (RAS) model (Zaller)

    • Receive: People hear persuasive messages

    • Accept: They accept some of these and reject others depending on their predispositions.

    • Sample: When they answer a survey, they “sample” from the top-of-mind considerations.

  • Outcome (DV): attitudes and attitude stability

  • Causes (IVs): the volume and clarity of messages (especially from elites)

  • (some) Expectations:

    • More engaged people are more persuadable when partisan cues are low

    • People with low engagement/knowledge will give more consistent answers when issues are highly salient

Theories: voting

Why do people vote?

  • Since politicians generally offer public goods, you can enjoy the benefits of your preferred candidate winning even if you don’t vote

  • Since voting has costs (even if they’re small), free riding can be preferable to actually turning out when the costs outweigh the benefits.

Theories: voting

Pivotal voting

  • Claim: people vote because they expect to sway the election

  • If this is true, then:

    • Turnout should be higher in close elections

    • Turnout should be higher when the electorate is small

    • Turnout will be higher in PR systems where one vote matters more.
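One way to see why closeness and electorate size matter for the pivotal voting model is to compute how likely it is that a single vote decides the outcome. The sketch below is a simplified illustration, not part of the textbook material: it assumes every other voter independently supports candidate A with probability `p_support`, and it counts a vote as pivotal only when the other voters split exactly evenly. The function name `tie_probability` is hypothetical.

```python
import math


def tie_probability(n_other_voters: int, p_support: float) -> float:
    """Chance that the other voters split exactly evenly, so one more vote decides.

    Simplified binomial model (an assumption for illustration): each of the
    n_other_voters independently votes for candidate A with probability
    p_support (strictly between 0 and 1), otherwise for candidate B.
    """
    if n_other_voters % 2 == 1:
        return 0.0  # an odd number of voters can never split exactly evenly
    k = n_other_voters // 2
    # log of C(n, k) * p^k * (1 - p)^k, computed with logs to avoid overflow
    log_prob = (
        math.lgamma(n_other_voters + 1)
        - 2 * math.lgamma(k + 1)
        + k * math.log(p_support)
        + k * math.log(1 - p_support)
    )
    return math.exp(log_prob)


# Larger electorates: the chance of casting the deciding vote shrinks
for n in (100, 10_000, 1_000_000):
    print(f"{n:>9} voters, even race: {tie_probability(n, 0.5):.6f}")

# Same electorate, but a lopsided race (55% support for A)
print(f"    10000 voters, 55/45 race: {tie_probability(10_000, 0.55):.2e}")
```

Under these assumptions the chance of being pivotal falls as the electorate grows and collapses when the race is not close. This probability plays the role of p in the common expected-utility formalization of turnout (often written R = pB - C), so when p is tiny, even small costs C outweigh the benefit B.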

Expressive voting model

  • Claim: people vote to enjoy the expressive benefits

  • If this is true then:

    • People with more extreme beliefs will be more likely to vote

    • Closeness or the size of the electorate shouldn’t matter much

It’s unlikely that either theory is entirely true, but both can be used to generate expectations and to support productive debate over the relative weight of the evidence favoring one explanation over another.

Assessing Theories

  • Theories require simplified representations of a complex reality

  • Utility, not “truth”: theoretical models invariably contain assumptions, and those assumptions are probably violated in practice.

George Box: “All models are wrong, but some are useful”

Assessing Theories

  • Consistency
    • Is our theory internally consistent? Does it have a clear logic?
  • Empirical accuracy
    • Does the theory help us understand the world?
    • Are observed realities consistent with the expectations our theory generates?
    • Can we use a theory to make useful predictions about future events?
    • Can the theory adapt in the face of inconsistent findings? (this will happen!)
  • Theories may be more or less useful in certain contexts
    • There’s stronger evidence for the pivotal voter model in small elections and referenda, but other models do a better job of explaining national-level turnout.

Hypotheses

  • Theories give causal explanations for why something affects something else

  • Hypotheses are specific testable implications generated by that theory.

    • Theory: people vote because of expressive benefits

    • Hypothesis: people with more extreme views will be more likely to turn out.

Hypotheses

Components:

  • Unit of analysis

  • Dependent variable

  • Independent variable

  • Direction of the predicted relationship

Good hypotheses inevitably involve comparative language (higher/lower/more/less/increase/decrease/better/worse)

Hypothesis Template

In a comparison of [unit of analysis], those having [one value on the independent variable] will be [more/less] likely to have [one value on the dependent variable] than those having a [different value on the independent variable].

Hypothesis Template

In a comparison of [voters], those having [stronger political views] will be more likely to have [turned out to vote] than those having [weaker political views].

  • Unit of analysis: voters

  • IV: strength of political views

  • DV: turnout

  • Relationship: strength increases turnout

Hypothesis Template

In a comparison of [states], those having [a larger middle class during industrialization] will be more likely to have [democracy] than those having [a smaller middle class during industrialization].

  • Unit of analysis: states

  • IV: size of the middle class

  • DV: democracy

  • Relationship: a larger middle class increases the likelihood of democracy

Hypothesis Template

In a comparison of [survey respondents], those having [higher levels of attention to politics] will be more likely to have [consistent responses] than those having [lower levels of attention to politics].

  • Unit of analysis: Survey respondents

  • IV: level of attention

  • DV: response consistency

  • Relationship: attention increases consistency

Complex Relationships

Good hypotheses may suggest a more complex set of relationships than just “positive/negative”. They could propose conditional/interactive/curvilinear relationships as well.

The “oil curse”

In a comparison of [countries], those having [higher levels of GDP] will be [more likely to be democratic] than [countries with lower GDP]; [however, this relationship will not hold for countries that get rich from oil exports].

Retrospective voting:

In a comparison of [voters], those having [lower levels of attention to politics] will be [more likely to vote for the incumbent when the economy is doing well]. Those having [higher levels of attention to politics] will be [more likely to vote based on policy preferences regardless of the state of the economy].

Bad hypotheses

  • The main determinant of war is the distribution of power in the international system.

  • In comparing individuals, annual income and the level of education are related.

  • Democracies are peaceful.

  • In comparing individuals, some people are more likely to favor the death penalty than others.

Next

  • Testing hypotheses by making comparisons

  • Graphing and describing relationships

Testing Hypotheses

Cross tabulation

Assuming we have a categorical independent variable (IV) and a categorical dependent variable (DV):

iv dv
HIGH No
HIGH No
LOW No
HIGH Yes
HIGH Yes
HIGH Yes
HIGH Yes
LOW Yes
LOW Yes
LOW Yes

Cross tabulation: Step 1

Start by calculating the number of observations with each value of each category:

            iv
dv       LOW   HIGH
No         1      2
Yes        3      4
Total      4      6
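Step 1 amounts to counting each combination of IV and DV values. A minimal Python sketch using the standard library’s `collections.Counter`, with the ten example observations typed in by hand:

```python
from collections import Counter

# The ten example observations, as (iv, dv) pairs
data = [
    ("HIGH", "No"), ("HIGH", "No"), ("LOW", "No"),
    ("HIGH", "Yes"), ("HIGH", "Yes"), ("HIGH", "Yes"), ("HIGH", "Yes"),
    ("LOW", "Yes"), ("LOW", "Yes"), ("LOW", "Yes"),
]

counts = Counter(data)  # number of observations for each (iv, dv) combination
iv_levels = ["LOW", "HIGH"]  # IV in the columns
dv_levels = ["No", "Yes"]    # DV in the rows

# Column totals: how many observations fall in each IV category
totals = {iv: sum(counts[(iv, dv)] for dv in dv_levels) for iv in iv_levels}

print(f"{'dv':<6}" + "".join(f"{iv:>6}" for iv in iv_levels))
for dv in dv_levels:
    print(f"{dv:<6}" + "".join(f"{counts[(iv, dv)]:>6}" for iv in iv_levels))
print(f"{'Total':<6}" + "".join(f"{totals[iv]:>6}" for iv in iv_levels))
```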

Cross tabulation: Step 2

Then, calculate the proportion/percentage of observations among each value of the IV.

If the independent variable is in the columns, then the columns should sum to 100%.

If the independent variable is in the rows, then the rows should sum to 100%.

            iv
dv       LOW        HIGH
No       1 (25%)    2 (33%)
Yes      3 (75%)    4 (67%)
Total    4          6
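Step 2 divides each cell by its column total, that is, by the number of observations in that IV category. The sketch below starts from the Step 1 counts; `column_percentages` is a hypothetical helper name, not a library function.

```python
# Cell counts from the Step 1 table, keyed by (iv, dv)
counts = {("LOW", "No"): 1, ("LOW", "Yes"): 3,
          ("HIGH", "No"): 2, ("HIGH", "Yes"): 4}
iv_levels = ["LOW", "HIGH"]
dv_levels = ["No", "Yes"]


def column_percentages(counts, iv_levels, dv_levels):
    """Percent of each DV value within each IV category (columns sum to 100)."""
    pct = {}
    for iv in iv_levels:
        column_total = sum(counts[(iv, dv)] for dv in dv_levels)
        for dv in dv_levels:
            pct[(iv, dv)] = 100 * counts[(iv, dv)] / column_total
    return pct


pct = column_percentages(counts, iv_levels, dv_levels)
for dv in dv_levels:
    cells = "  ".join(f"{counts[(iv, dv)]} ({pct[(iv, dv)]:.0f}%)" for iv in iv_levels)
    print(f"{dv:<5} {cells}")
```

Dividing by the column total is what lets us compare the LOW and HIGH groups even though they contain different numbers of observations.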

Cross tabulation: interpretation

Look at what happens to the DV at different values of the IV. If your variables are ordinal, you should be able to identify a direction of the effect.

The proportion of “Yes” values decreases as the IV goes from lower to higher, so this is a negative or inverse relationship.

Using a bar graph or line graph can make these relationships easier to spot.
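The same reading can also be written down as a rule: list the DV percentages from the lowest to the highest IV category and check whether each step goes up or down. The helper below is a hypothetical sketch that only makes sense when the IV categories have a meaningful order; the inputs are percentages taken from the tables in this chapter.

```python
def describe_direction(shares):
    """Describe how a DV percentage changes across ordered IV categories.

    shares: DV percentages listed from the lowest to the highest IV category.
    """
    steps = [later - earlier for earlier, later in zip(shares, shares[1:])]
    if all(s == 0 for s in steps):
        return "no relationship"
    if all(s >= 0 for s in steps):
        return "positive"
    if all(s <= 0 for s in steps):
        return "negative"
    return "not monotonic (possibly curvilinear)"


# % "Yes" in the LOW and HIGH columns of the example cross tab
print(describe_direction([75, 67]))
# % voted by education level, from less than high school to graduate degree
print(describe_direction([59, 76, 85, 93, 94]))
# % voted for Democrats, Independents, Republicans (treating party ID as ordered)
print(describe_direction([90, 64, 88]))
```

A pattern that goes down and then up is not automatically curvilinear in any substantive sense; the label only flags that a simple positive/negative description does not fit.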

Cross tabulation: notes

  • Key rule: always calculate percentages or proportions by categories of the independent variable.

    • This allows you to compare groups that are different sizes.
  • If one or both variables are interval-level, you can bin them in order to use them in a cross tab. For instance, you could separate an interval variable like age into a series of age ranges.
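Binning maps each interval-level value to a range. A minimal sketch with Python’s standard `bisect` module; the ages, cut points, and labels are made up for illustration, and where to draw the cut points is a substantive choice that can change what the cross tab shows.

```python
import bisect

# Hypothetical ages, for illustration only (not from the survey data)
ages = [19, 23, 31, 45, 52, 64, 70, 88]

# Illustrative bin boundaries: an age equal to a cutoff falls into the
# next bin up (e.g. 30 goes into "30-44")
cutoffs = [30, 45, 65]
labels = ["18-29", "30-44", "45-64", "65+"]


def age_group(age: int) -> str:
    """Return the label of the age range containing age."""
    return labels[bisect.bisect_right(cutoffs, age)]


groups = [age_group(a) for a in ages]
print(groups)
```

The resulting labels form an ordinal variable that can go straight into a cross tab.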

Cross tabulation: example

Hypothesis: in a comparison of individuals, independents will be less likely to turn out to vote than people who support one party or the other.

How should I calculate proportions here?

Voter Turnout in 2020 by party ID
                   Party ID
turnout2020        Democrat    Independent    Republican
0. Did not vote    335         316            382
1. Voted           3160        560            2714

Cross tabulation: example

Are these results generally consistent with my hypothesis?

Voter Turnout in 2020 by party ID
                   Party ID
turnout2020        Democrat      Independent    Republican
0. Did not vote    335 (10%)     316 (36%)      382 (12%)
1. Voted           3160 (90%)    560 (64%)      2714 (88%)
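Because party ID is the IV, the percentages are computed within each party ID column. A short Python check of the percentages above, using the counts from the table:

```python
# Counts from the 2020 turnout-by-party-ID table
counts = {
    "Democrat":    {"Did not vote": 335, "Voted": 3160},
    "Independent": {"Did not vote": 316, "Voted": 560},
    "Republican":  {"Did not vote": 382, "Voted": 2714},
}

# Percent within each party ID category (the IV), so each column sums to 100
turnout_pct = {}
for party, cells in counts.items():
    column_total = sum(cells.values())
    turnout_pct[party] = {k: 100 * v / column_total for k, v in cells.items()}

for party, pct in turnout_pct.items():
    print(f"{party:<12} voted: {pct['Voted']:.0f}%   did not vote: {pct['Did not vote']:.0f}%")
```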

If we think of party ID as an ordered variable, this is a curvilinear relationship.

Row/Column percentages

First, here’s the relationship between education and voter turnout with % calculated within each education level (the IV):

Voter Turnout in 2020 by highest level of education completed
Education: 1 = Less than high school credential; 2 = High school credential; 3 = Some post-high school, no bachelor's degree; 4 = Bachelor's degree; 5 = Graduate degree
turnout2020        1            2            3             4             5
0. Did not vote    130 (41%)    286 (24%)    380 (15%)     135 (7%)      91 (6%)
1. Voted           185 (59%)    883 (76%)    2148 (85%)    1749 (93%)    1388 (94%)
Note: Column % in parentheses

The results suggest a positive or direct relationship: as education increases, so does the % turnout.

Row/Column percentages

What happens if I calculate % among the values of the DV?

Here’s the relationship between education and voter turnout with % calculated within each value of voter turnout (the DV):

Voter Turnout in 2020 by highest level of education completed
Education: 1 = Less than high school credential; 2 = High school credential; 3 = Some post-high school, no bachelor's degree; 4 = Bachelor's degree; 5 = Graduate degree
turnout2020        1            2            3             4             5
0. Did not vote    130 (13%)    286 (28%)    380 (37%)     135 (13%)     91 (9%)
1. Voted           185 (3%)     883 (14%)    2148 (34%)    1749 (28%)    1388 (22%)
Note: Row % in parentheses

Here, the results can give the misleading impression of a curvilinear relationship: the percentages drop off for bachelor’s degrees and above, even though turnout is actually highest in those groups.

Row/Column percentages

Either of these tables might be a valid way to look at these data, but they answer slightly different questions:

  • If I want to compare turnout at different levels of education, then I need to calculate % turnout among people with different levels of education.

  • If I want to compare education among voters and non-voters, then I need to calculate % education among people who voted and didn’t vote.

  • Which variable is the IV and which is the DV is sometimes a theoretical question, but in this case it’s unlikely that voting is causing people to become more educated, so it probably doesn’t make sense to calculate percentages by voting vs. non-voting.
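Both education tables use the same counts; only the denominator changes. A short Python sketch reproducing both sets of percentages for the “Voted” row, with the counts typed in from the tables:

```python
# Counts from the 2020 turnout-by-education tables; positions 0-4 follow the
# five education categories, from less than high school to graduate degree
did_not_vote = [130, 286, 380, 135, 91]
voted = [185, 883, 2148, 1749, 1388]

# Column %: share who voted within each education level (education as the IV)
col_pct_voted = [100 * v / (v + n) for v, n in zip(voted, did_not_vote)]

# Row %: how voters are distributed across education levels (turnout as the IV)
total_voters = sum(voted)
row_pct_voted = [100 * v / total_voters for v in voted]

print("Column % voted: ", [round(p) for p in col_pct_voted])
print("Row % of voters:", [round(p) for p in row_pct_voted])
```

The column percentages rise steadily with education, while the row percentages peak in the middle category simply because that category contains the most people.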

Mean Comparison

When we have an interval-level outcome and a categorical independent variable, we can group each observation by its value on the IV and then calculate the mean of the DV within each group.

For instance, suppose I want to examine the relationship between national wealth and carbon emissions. My hypothesis is that wealthier nations will have more emissions than poorer nations.

GDP per capita has been grouped into five categories, so now I just need to calculate the average CO2 emissions per capita within each group of the IV.

country       gdp.percap.5cat   co2.percap
Afghanistan   1. $3k or less      0.281803
Albania       3. $10k to $25k     1.936486
Algeria       3. $10k to $25k     3.988271
Angola        2. $3k to $10k      1.194668
Argentina     3. $10k to $25k     3.995881
Armenia       3. $10k to $25k     2.030401
Australia     5. $45k or more    16.308205
Austria       5. $45k or more     7.648816
Azerbaijan    3. $10k to $25k     3.962984
Bahrain       5. $45k or more    20.934996
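The grouping-and-averaging step can be sketched in a few lines of Python. This sketch uses only the ten rows shown above, so its group means will not match the full-data table of means, and category 4 does not appear at all because none of these ten countries falls in it.

```python
from collections import defaultdict
from statistics import mean

# The ten rows shown above: (country, GDP per capita category, CO2 per capita)
rows = [
    ("Afghanistan", "1. $3k or less", 0.281803),
    ("Albania", "3. $10k to $25k", 1.936486),
    ("Algeria", "3. $10k to $25k", 3.988271),
    ("Angola", "2. $3k to $10k", 1.194668),
    ("Argentina", "3. $10k to $25k", 3.995881),
    ("Armenia", "3. $10k to $25k", 2.030401),
    ("Australia", "5. $45k or more", 16.308205),
    ("Austria", "5. $45k or more", 7.648816),
    ("Azerbaijan", "3. $10k to $25k", 3.962984),
    ("Bahrain", "5. $45k or more", 20.934996),
]

# Group the DV (CO2 per capita) by category of the IV (GDP per capita)
groups = defaultdict(list)
for _country, gdp_cat, co2 in rows:
    groups[gdp_cat].append(co2)

# Mean of the DV within each IV category; the numeric prefixes sort in order
group_means = {cat: mean(values) for cat, values in sorted(groups.items())}
for cat, avg in group_means.items():
    print(f"{cat:<16} {avg:.3f}")
```

With the full dataset, a data-frame library’s group-by operation would do the same grouping and averaging in one step.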

Mean Comparison

GDP per capita range    Mean CO2 emissions per capita
1. $3k or less          0.3128312
2. $3k to $10k          1.2680574
3. $10k to $25k         4.4065669
4. $25k to $45k         8.0307610
5. $45k or more         12.3134306
